Abstract The genomes of transmissible gastroenteritis virus (TGEV) and mouse
hepatitis virus (MHV) have been generated with a novel construction strategy
that allows for the assembly of very large RNA and DNA genomes from a panel
of contiguous cDNA subclones. Recombinant viruses generated by these methods
contained the appropriate marker mutations and replicated as efficiently as
wild-type virus. The MHV cloning strategy can also be used to generate
recombinant viruses that contain foreign genes or mutations at virtually any
given nucleotide. Molecularly cloned MHV viruses were engineered to express
green fluorescent protein (GFP), demonstrating the feasibility of the
systematic assembly approach for creating recombinant viruses that express
foreign genes. The systematic assembly approach was also used to develop an
infectious clone of the newly identified human coronavirus, the severe acute
respiratory syndrome virus (SARS-CoV). Our cloning and assembly strategy
generated an infectious clone within 2 months of identification of the
causative agent of SARS, providing a critical tool to study coronavirus
pathogenesis and replication. The availability of coronavirus infectious
cDNAs heralds a new era in coronavirus genetics and genomic applications,
especially within the replicase proteins, whose functions in replication and
pathogenesis are virtually unknown.
1

Introduction 

Molecular analysis of the structure and function of RNA virus genomes
has been profoundly advanced by the availability of full-length cDNA
clones, the source of infectious RNA transcripts that replicate efficiently
when introduced into permissive cell lines (Boyer and Haenni 1994).
Coronaviruses contain the largest known single-stranded, positive-polarity
RNA genomes, about 30 kb in length (Cavanagh et al. 1997; de Vries et al.
1997; Eleouet et al. 1995). Until recently, coronavirus genetic analysis has
been limited to analysis of temperature-sensitive (ts) mutants (Fu and Baric
1992, 1994; Lai and Cavanagh 1997; Schaad and Baric 1994; Stalcup et
al. 1998), defective interfering (DI) RNAs (Izeta et al. 1999; Narayanan
and Makino 2001; Repass and Makino 1998; Williams et al. 1999), and
recombinant viruses generated by targeted recombination (Fischer et al.
1997; Hsue and Masters 1999; Kuo et al. 2000). Among these, targeted
recombination is the seminal approach developed to systematically assess
the function of individual mutations in the 3'-most ~10 kb of the
MHV genome. Methods to assemble an MHV full-length infectious construct
have been hampered by the large size of the genome, the regions
of chromosomal instability, and the inability to synthesize full-length
transcripts (Almazan et al. 2000; Masters 1999; Yount et al. 2000). This is
especially problematic within the group 2 coronavirus replicase, where
several regions of chromosomal toxicity and instability have hampered
the development of infectious cDNAs. Full-length infectious constructs
will allow for the systematic dissection of the structure and function of
each viral gene, the phenotypic consequences of gene rearrangement on
virus replication and pathogenesis, the development of coronavirus
heterologous gene expression systems, and a clearer understanding of the
transcription and replication strategy of the Coronaviridae. In this
report, we review strategies for building coronavirus infectious cDNAs by
using mouse hepatitis virus strain A59 as a model.


2 

The Coronavirus Genome 

The coronavirus genome, a single-stranded RNA, is the largest viral
RNA genome known to exist in nature (27.6-31.3 kb). Genomic RNAs
have a 5' terminal cap and a 3' terminal poly(A) tail. In addition, a
leader sequence of 65-98 nucleotides and a 200- to 400-nucleotide
untranslated region are located at the 5' terminus, whereas a 200- to
500-nucleotide untranslated region is located at the 3' terminus. The
5'-most two-thirds of the genome encodes the replicase gene in two open
reading frames (ORFs), 1a and 1b, the latter of which is expressed by
ribosomal frameshifting (Almazan et al. 2000; Eleouet et al. 1995). Like
many other positive-sense RNA viruses, the coronavirus replicase is
translated as a large precursor polyprotein that is processed by viral
proteinases, giving rise to ~15 replicase proteins. The functions of most of
the coronavirus replicase proteins are unknown. However, based on nucleotide
sequence homology and empirical studies, identifiable functions include two
papain-like cysteine proteases, a chymotrypsin-like 3C protease, a
cysteine-rich growth factor-related protein, an RNA-dependent RNA polymerase,
a nucleoside triphosphate (NTP)-binding/helicase domain, and a zinc-finger
nucleic acid-binding domain (Enjuanes et al. 2000a; Penzes et al.
2001; Siddell 1995). Most of the replicase gene products colocalize with
replication complexes at sites of RNA synthesis on internal membranes.
However, a spectrum of genetically informative mutations has not been
systematically targeted to any of these replicase proteins, so we have
little insight into the organization of the replicase complex and the
location of functional motifs that regulate transcription, replication, and
RNA recombination. Because an extremely rich set of molecular reagents is
available against the replicase proteins, the availability of a molecular
clone of MHV allows, for the first time, a systematic genetic analysis of
gene 1 function in coronavirus replication.


3 

Systematic Approaches to Assembling Coronavirus cDNAs 
from a Panel of Contiguous Subclones 

Coronavirologists have seized on several different strategies to build
infectious cDNA clones. All, however, were primarily designed to circumvent
problems associated with the large size of the coronavirus genome,
regions of chromosomal instability, and other problems associated with
the production of full-length infectious transcripts (Almazan et al. 2000;
Masters 1999; Yount et al. 2000). Our solution was to assemble infectious
cDNAs from a panel of contiguous subclones that spanned the entire
length of the TGEV and MHV genomes. Each subclone was flanked by
unique restriction sites with characteristics that allow for the systematic
and precise assembly of a full-length cDNA by in vitro ligation. For
this strategy to be efficient, restricted subclone fragments had to be
incapable of self-concatemer formation and must not spuriously assemble with
other noncontiguous subclones.





R.S. Baric • A.C. Sims 


Conventional class II restriction enzymes, such as EcoRI, leave identical
sticky ends that assemble with similarly cut DNA in the presence of
DNA ligase (Pingoud and Jeltsch 2001; Sambrook et al. 1989). Because
these enzymes leave identical compatible ends, digested fragments randomly
self-assemble into large concatemers and, therefore, they are poor
choices for assembling large intact genomes or chromosomes. However,
a second group of class II restriction enzymes (e.g., BglI, BstXI, SfiI) also
recognize a symmetrical sequence but leave variable sticky ends 1-4
nucleotides in length and, consequently, restrict assembly cascades to
specific pathways (Table 1). For example, the type II restriction enzyme
BglI recognizes the symmetrical sequence GCCNNNN^NGGC and
cleaves a random DNA sequence on average every ~4,096 base pairs.
Because 64 different 3-nucleotide overhangs can be generated, DNA fragments
will only assemble with the appropriate 3-nucleotide complementary
overhang generated at an identical BglI restriction site. As a result,
identical ends are generated only every ~262,000 base pairs (4,096 x 64),
providing a powerful means for the construction of very large DNA and RNA
genomes. Similarly, the type IIS restriction enzyme Esp3I
recognizes an asymmetric sequence and makes a staggered cut 1 and
5 nucleotides downstream of the recognition sequence, leaving 256 possible,
mostly asymmetrical, 4-nucleotide overhangs (CGTCTCN^NNNN). As
identical Esp3I ends are generated every ~1,000,000 base pairs or so in a
random DNA sequence, most restricted fragments do not self-assemble
(Yount et al. 2002). Rather, specific recursive assembly pathways
can be designed that hypothetically allow assembly of >1 million base
pair DNA genomes (-2 fragments) (Table 1). We took advantage of
several unique properties inherent in type II restriction enzymes to
build coronavirus infectious cDNAs.

Table 1. Selected restriction enzymes used in assembly of recombinant
full-length genomes

Restriction      Recognition site      No. of variable   Average cutting     Actual frequency of
enzyme (a)                             sticky ends       frequency (b)       compatible ends (b)
-------------------------------------------------------------------------------------------------
BglI             GCCNNNN^NGGC          3 nt/64           ~4,096 nt           ~262,144 nt
                 CGGN^NNNNCCG          potential ends

BstXI            CCANNNNN^NTGG         4 nt/256          ~4,096 nt           ~1,048,576 nt
                 GGTN^NNNNNACC         potential ends

SfiI             GGCCNNNN^NGGCC        3 nt/64           ~65,536 nt          ~4,194,304 nt
                 CCGGN^NNNNCCGG        potential ends

SapI             GCTCTTCN^NNNN         3 nt/64           ~16,384 nt (in      ~1,048,576 nt*
                 CGAGAAGNNNNN^         potential ends    either strand)

AarI             CACCTGCNNNN^NNNN      4 nt/256          ~16,384 nt (in      ~4,194,304 nt*
                 GTGGACGNNNNNNNN^      potential ends    either strand)

Esp3I (BsmBI)    CGTCTCN^NNNN          4 nt/256          ~4,096 nt (in       ~1,048,576 nt*
                 GCAGAGNNNNN^          potential ends    either strand)

a Other enzymes leaving many different overhangs: BsmVI, EclHKI, FokI, MboII,
Tth111I, AhdI, DrdI, BspMI, BsmAI, BcgI, BmrI, BpmI, BsaI, Bse I, EarI, PflMI,
BstV2, VpaK32I, AbeI, PpiI.

b Assuming a totally random DNA sequence; *asymmetric cutters like SapI, AarI
and Esp3I can have recognition sites in either strand of DNA, so the actual
site frequency is ~1/2 of the indicated values, and they can be engineered as
“no-see-um” sites (Yount et al. 2002).
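
The frequency argument above reduces to simple arithmetic, sketched here under the random-sequence assumption stated in the footnote to Table 1. The function names and the enzyme dictionary are illustrative only and do not come from the original work; each enzyme is described by the number of specified bases in its recognition site and the length of its variable overhang.

```python
# Minimal sketch: expected spacing of restriction sites and of *identical*
# sticky ends in a random DNA sequence (equal base frequencies assumed).

def site_spacing(specified_bases: int) -> int:
    """Average distance (nt) between recognition sites on one strand."""
    return 4 ** specified_bases


def identical_end_spacing(specified_bases: int, overhang_len: int) -> int:
    """Average distance (nt) between sites that leave the same overhang."""
    possible_overhangs = 4 ** overhang_len
    return site_spacing(specified_bases) * possible_overhangs


# (specified bases in the recognition site, variable overhang length)
ENZYMES = {
    "BglI": (6, 3),    # GCCNNNN^NGGC
    "BstXI": (6, 4),   # CCANNNNN^NTGG
    "SfiI": (8, 3),    # GGCCNNNN^NGGCC
    "Esp3I": (6, 4),   # CGTCTCN^NNNN
}

if __name__ == "__main__":
    for name, (specified, overhang) in ENZYMES.items():
        print(
            f"{name}: site every ~{site_spacing(specified):,} nt, "
            f"identical end every ~{identical_end_spacing(specified, overhang):,} nt"
        )
```

For asymmetric cutters such as Esp3I, the site can occur on either strand, so the per-strand spacing computed here would be roughly halved in double-stranded DNA.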

Initially, we isolated five cDNA subclones spanning the entire TGEV
genome (designated TGEV A, B, C, D/E, and F) by RT-PCR using primers
that introduced unique BglI restriction sites at the 5' and 3' ends of each
fragment without altering the amino acid coding sequences of the virus
(Table 2). The TGEV A, C, D/E, and F clones were stable in plasmid DNAs
in Escherichia coli. The B fragment, however, was unstable, containing
deletions or insertions in the wild-type sequence at a region of
instability in the TGEV genome noted by other investigators (Almazan et al.
2000; Eleouet et al. 1995). To prevent fragment instability, we used
primer-mediated mutagenesis to bisect the B fragment at the unstable site
with an adjoining BstXI (CCATTCAC^TTGG) site, resulting in TGEV B1
and TGEV B2 amplicons (Fig. 1; Table 2). It is likely that sequences
(nt 9,600-9,950) in and around the TGEV 3C-like protease (3CLpro) motif
are either bactericidal or unstable in microbial vectors (Almazan et al.
2000; Yount et al. 2000). The resulting six fragments, TGEV A, B1, B2, C,
D/E, and F, were ligated in vitro to generate a full-length cDNA of the
TGEV genome (Fig. 1). Molecularly cloned viruses were indistinguishable
from wild type and contained the marker mutations and unique
BglI and BstXI junction sequences used in the assembly of the infectious
construct (Yount et al. 2000).

Table 2. Design of TGEV junction sequences

Restriction site junction      Location            Junction
-------------------------------------------------------------
5'-GCCTGTT^TGGC-3'             BglI, nt 6,159      A-B1
3'-CGGA^CAAACCG-5'

5'-CCATTCAC^TTGG-3'            BstXI, nt 9,949     B1-B2
3'-GGTA^AGTGAACC-5'

5'-GCCGCAT^TGGC-3'             BglI, nt 11,355     B2-C
3'-CGGC^GTAGCCG-5'

5'-GCCTTCT^TGGC-3'             BglI, nt 16,595     C-D/E
3'-CGGA^AGAACCG-5'

5'-GCCGTGC^AGGC-3'             BglI, nt 23,487     D/E-F
3'-CGGC^ACGTCCG-5'
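
As an illustration of why these junctions direct a single assembly order, the following sketch extracts the single-stranded overhang at each Table 2 junction and checks that the five overhangs cannot pair with themselves or with one another in the wrong orientation. The junction sequences and cut positions are taken from Table 2; the helper names and the pass/fail criteria are assumptions made for this example, not part of the published method.

```python
# Minimal sketch: derive the sticky end at each TGEV junction from the
# cut-marked strands in Table 2 and test for unwanted pairings.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# junction label -> (top strand 5'->3', bottom strand 3'->5'), "^" marks the cut
JUNCTIONS = {
    "A-B1": ("GCCTGTT^TGGC", "CGGA^CAAACCG"),
    "B1-B2": ("CCATTCAC^TTGG", "GGTA^AGTGAACC"),
    "B2-C": ("GCCGCAT^TGGC", "CGGC^GTAGCCG"),
    "C-D/E": ("GCCTTCT^TGGC", "CGGA^AGAACCG"),
    "D/E-F": ("GCCGTGC^AGGC", "CGGC^ACGTCCG"),
}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def overhang(top_marked: str, bottom_marked: str) -> str:
    """Top-strand bases lying between the two staggered cut positions."""
    top_cut = top_marked.index("^")
    bottom_cut = bottom_marked.index("^")
    top = top_marked.replace("^", "")
    return top[min(top_cut, bottom_cut):max(top_cut, bottom_cut)]


def assembly_is_unambiguous(ends: list[str]) -> bool:
    """True if every end is unique and none pairs with itself or another end
    when a fragment is flipped (i.e., no end is a reverse complement of any
    end in the set)."""
    if len(set(ends)) != len(ends):
        return False
    flipped = {reverse_complement(e) for e in ends}
    return flipped.isdisjoint(ends)


OVERHANGS = {name: overhang(top, bottom) for name, (top, bottom) in JUNCTIONS.items()}

if __name__ == "__main__":
    for name, end in OVERHANGS.items():
        print(f"{name}: {end}")
    print("unambiguous:", assembly_is_unambiguous(list(OVERHANGS.values())))
```

Only the top strand and the two cut positions are used, so the check does not depend on the bottom-strand bases as printed.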


4 

Assembling MHV Infectious cDNAs 

One potential problem with the original approach was that several
“silent” mutations were inserted to introduce the unique BglI sites into the
TGEV component clones. To circumvent this problem, a variation of the
systematic assembly approach was used to build an infectious cDNA of the
group II coronavirus mouse hepatitis virus (MHV) (Yount et al. 2002).
The enzyme Esp3I recognizes an asymmetrical site and cleaves external
to the recognition sequence, allowing for traditional and “no-see-um”
cloning applications (Fig. 2, Table 1). With traditional approaches, Esp3I
sites can be oriented to reform the recognition site after ligation of two
MHV cDNAs, leaving the restriction site within the genomes of recombinant
viruses. However, the Esp3I recognition site is asymmetrical, so a
simple reverse orientation allows for the insertion of an Esp3I recognition
sequence on the ends of two adjacent clones with the cleavage site
derived from virtually any 4-nucleotide sequence combination dictated
by the virus sequence. On cleavage and ligation with the adjoining
fragment, the Esp3I sites are lost from the final ligation products, leaving
a seamless junction compiled from the exact MHV-A59 sequence. Because
of this property, unique junctions can be inserted at virtually any
position between two component clones without mutating the viral genome
sequence. Additionally, a large number of other restriction enzymes
share this property (e.g., SapI, AarI), expanding the utility of the
“no-see-um” technology (Table 1).

Fig. 1. Strategy for the systematic assembly of TGEV full-length cDNA. The TGEV
genome is a positive-sense, single-stranded RNA of about 28.5 kb. Six independent
subclones (A, B1, B2, C, D/E, and F) that span the entire length of the genome were
isolated by RT-PCR using primer pairs that introduced unique NotI, BglI, and/or
BstXI restriction sites at each end. On ligation, the intact viral genome is generated
as a cDNA. A unique T7 start site and a 25-nucleotide poly(T) tail allow for in vitro
transcription of full-length, capped, polyadenylated transcripts (Yount et al. 2000).
PL, papain-like protease; 3CLpro, 3CL protease; GFL, growth factor like; pol,
polymerase motif; MIB, metal binding motif; hel, helicase motif; VD/CD, variable or
conserved domains

[Fig. 2 schematic: two panels, “Traditional” and “No See’m Technology”, each
showing the Esp3I site (5'-CGTCTCN-3' / 3'-GCAGAGNNNNN-5') at the ends of the
MHV A and MHV B subclones. In the traditional panel the ligated product retains
the Esp3I site; in the no-see-um panel the Esp3I site is lost and the intact
MHV sequence is restored.]

Fig. 2. Use of Esp3I in the traditional and “no-see-um” approaches. The traditional
approach to the use of Esp3I involves the ligation of two fragments containing
identical Esp3I restriction sites, resulting in a ligation product with an intact
Esp3I site remaining. In the “no-see-um” approach, a simple reverse orientation of
the restriction sites allows for the specific removal of the Esp3I site from the two
fragments, resulting in a ligation product lacking the engineered restriction site.
The use of the “no-see-um” technology allows for the assembly of large DNAs from
smaller subclones without the incorporation of unique restriction sites into the
genome. (Yount et al. 2002)
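
The seamless junction described above can be illustrated with a small string-level simulation. This is a sketch under simplifying assumptions: only the top strand is modeled, the 20-nt "viral" sequence and the ACGT filler are invented for the example, and the function names are hypothetical. The cut geometry follows the Esp3I description in the text (1 nt downstream on one strand, 5 nt on the other, giving a 4-nt sticky end).

```python
# Minimal sketch of "no-see-um" assembly: Esp3I sites are placed outside the
# viral sequence in opposite orientations, so digestion removes both sites and
# ligation restores the exact viral sequence.

ESP3I = "CGTCTC"     # recognition site as read on the top strand
ESP3I_RC = "GAGACG"  # the same site when it lies on the bottom strand


def digest_upstream(fragment: str) -> tuple[str, str]:
    """Upstream clone: site sits 3' of the insert in reverse orientation.
    Returns (retained top strand, 4-nt sticky end in top-strand letters)."""
    site = fragment.rindex(ESP3I_RC)
    top_cut = site - 5      # cut 5 nt from the site on the top strand
    bottom_cut = site - 1   # cut 1 nt from the site on the bottom strand
    return fragment[:top_cut], fragment[top_cut:bottom_cut]


def digest_downstream(fragment: str) -> tuple[str, str]:
    """Downstream clone: site sits 5' of the insert in forward orientation.
    Returns (4-nt sticky end, retained top strand starting with that end)."""
    site = fragment.index(ESP3I)
    top_cut = site + len(ESP3I) + 1  # cut 1 nt past the site on the top strand
    return fragment[top_cut:top_cut + 4], fragment[top_cut:]


def ligate(upstream: str, downstream: str) -> str:
    """Join two digested clones if their sticky ends match."""
    body, end_up = digest_upstream(upstream)
    end_down, rest = digest_downstream(downstream)
    if end_up != end_down:
        raise ValueError(f"incompatible ends: {end_up} vs {end_down}")
    return body + rest


# Hypothetical viral sequence; the junction is the 4 nt starting at index 9.
VIRUS = "ACGGTTCAATCCCTGGAACT"
K = 9
UPSTREAM_CLONE = VIRUS[:K + 4] + "T" + ESP3I_RC + "ACGT"
DOWNSTREAM_CLONE = "ACGT" + ESP3I + "A" + VIRUS[K:]

if __name__ == "__main__":
    rebuilt = ligate(UPSTREAM_CLONE, DOWNSTREAM_CLONE)
    print(rebuilt == VIRUS, ESP3I in rebuilt, ESP3I_RC in rebuilt)
```

Because the 4-nt junction is copied from the viral sequence itself, any position can serve as a junction without changing the genome, which is the property the text attributes to the reverse-oriented sites.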

During the isolation of the MHV component clones, it was also necessary
to remove three preexisting Esp3I sites located throughout the
MHV ORF1 sequence (Bonilla et al. 1994). Mutations inserted to ablate
these sites were used as marker mutations to distinguish molecularly
cloned from wild-type virus. We then isolated seven consensus cDNAs
that spanned the entire length of the MHV-A59 genome in the same
manner as the TGEV infectious construct (Fig. 3). Seven subclones were
necessary because the MHV-A59 genome contains several major regions of
sequence toxicity in microbial cloning vectors, most of which map between
~10 and 15 kb in the MHV ORF 1a/ORF 1b polyprotein, and an
unstable region mapping to ~5.0 kb in ORF 1a. As described for the TGEV
B fragment, cDNAs were isolated after bisecting the toxic domains
and separating them into independent subclones. However, many subclones
were still unstable in traditional pUC-based cloning vectors (e.g.,
pGEM, TopoII) even when maintained at low temperature. Consequently,
we used pSMART cloning vectors (Lucigen), which lack a promoter and
indicator gene and contain transcriptional and translational terminators
surrounding the cloning site. Instability appears to be associated with
expression, as this entire domain (nucleotides 9,555-15,754) is also stable
in yeast vectors (pYES2.1 Topo TA Cloning Kit from Invitrogen) that
maintain tight regulation over foreign gene expression (Yount et al.,
unpublished results). Full-length MHV-A59 cDNA was systematically assembled
through the simultaneous in vitro ligation of a series of seven
subgenomic cDNAs (Yount et al. 2002). In the future, it may be possible
to construct larger subgenomic fragments spanning the entire genome
by using the pSMART cloning vectors, thereby simplifying the assembly
strategy, although we have not tested this directly.

The TGEV and MHV A fragments contain a T7 promoter, whereas the
TGEV F and MHV G fragments terminate in a poly(T) tract at the 3'
end, allowing for in vitro T7 transcription of infectious capped,
polyadenylated transcripts. The poly(A) tails generated from these
transcripts are 25 nucleotides in length, which appears sufficient for
transcript infectivity. At this time, we do not know the minimal number of
3' poly(A) residues necessary for transcript infectivity or whether a 5'
methylated cap is essential. Electroporation of the genomic-length RNAs
resulted in the production of recombinant MHV virus with growth
characteristics identical to those of the wild-type viruses (Yount et al. 2000,
2002). Importantly, the molecularly cloned viruses contained marker
mutations engineered into the component clones. Inclusion of
nucleocapsid (N)-encoding transcripts enhanced the infectivity of full-length
MHV and TGEV transcripts. In MHV, N transcripts enhanced the infectivity of
full-length MHV-A59 transcripts by 10- to 15-fold, as evidenced by
increased viral antigen expression and virus titers at 25 h postinfection
(Yount et al. 2002). It is unclear whether MHV N transcripts, N protein,
or both are essential for increased virus yields after electroporation, or
whether this effect would be observed with transcripts encoding unrelated
genes. Coronaviruses have been demonstrated to package low
concentrations of subgenomic mRNAs, especially N transcripts, and several
studies have suggested that N may function in transcription and replication
and is tightly associated with the replication complex. With IBV,
but not TGEV or HCoV-229E, N transcripts are absolutely essential for
full-length transcript infectivity (Casais et al. 2001). With HCoV-229E,
other groups have shown that the N gene is not required for subgenomic
transcription (Thiel et al. 2001). Clearly, additional studies are needed to
evaluate the role of N protein in RNA transcript infectivity.

The MHV cDNA cassettes can be ligated systematically, as described
for TGEV, or simultaneously. Although numerous incomplete assembly
intermediates were evident, our demonstration that simultaneous ligation
of seven cDNAs results in full-length cDNA simplifies the
assembly strategy. At this time, there is no evidence to
indicate that this approach might introduce spurious mutations or genome
rearrangements from aberrant assembly cascades. However, it is
possible that such variants might arise after RNA transfection, as a
consequence of high-frequency MHV RNA recombination between incomplete
and genome-length transcripts. It is likely that such variants would
be replication impaired and rapidly out-competed by wild-type virus. A
second limitation is that the yield of full-length cDNA product is
reduced, resulting in less robust transfection efficiencies compared with
the more traditional systematic assembly method. At this time, the
MHV approach suffers from the large number of component clones (seven),
which increases the complexity of the system and reduces the yield of
full-length cDNA product after in vitro ligation. If the large number of
toxic domains in the MHV genome is duplicated in other group II
coronaviruses, this will likely interfere with the development of other
infectious cDNAs as well. Topics of future research include: (1) Can group II
coronavirus cDNAs be stabilized as full-length constructs in bacterial
artificial chromosomes or poxvirus vectors, as has been reported with
TGEV, IBV, and HCoV-229E? (2) How does N function to enhance infectivity
of full-length transcripts? (3) How can we enhance yields or the
infectivity of coronavirus infectious cDNAs and transcripts and allow for
critical review of the consequences of lethal mutations? (4) Can we
reduce the number of component clones needed to assemble group II
coronavirus infectious cDNAs?

Fig. 3. Systematic assembly strategy for the construction of MHV-A59 full-length
cDNA. The MHV genome is a positive-sense, single-stranded RNA of ~31.5 kb. Seven
independent subclones (A, B, C, D, E, F, and G) that span the entire MHV genome
were isolated by RT-PCR. Unique BglI and Esp3I restriction sites, located at the 5'
and 3' ends of each subclone, were used to assemble a full-length cDNA. A unique
T7 start site was inserted at the 5' end of the MHV A fragment and a 25-nucleotide
poly(T) tail was inserted at the 3' end of the MHV G fragment, allowing for in vitro
transcription of full-length, capped, polyadenylated transcripts. Note: Esp3I sites
are lost in the assembly process. (Yount et al. 2002)


4.1 

Applications in Genomics 

Our assembly strategy for coronavirus infectious constructs is simple
and straightforward, although the synthesis of full-length transcripts is
technically challenging. In contrast to infectious clones of other
positive-strand viruses, our TGEV and MHV constructs must be assembled
de novo and do not exist intact in bacterial or viral vectors. This does
not restrict the method’s applicability for reverse genetic applications.
Rather, it allows for rapid genetic manipulation of independent subclones,
which minimizes the introduction of spurious mutations elsewhere
in the genome during recombinant DNA manipulation. Theoretical
limits of our method may exceed several million base pairs of DNA
and will likely surpass the cloning capacity of bacterial (BAC) and
eukaryotic artificial chromosome vectors (Grimes and Cooke 1998). Our
systematic assembly method should also be appropriate for constructing
full-length infectious clones of other large RNA viruses, including
coronaviruses (27-32 kb), toroviruses (24-27 kb), and filoviruses like
Marburg (19 kb) (de Vries et al. 1997; Peters et al. 1996). Viral genomes that
are unstable in prokaryotic vectors can also be cloned by these methods
(Boyer and Haenni 1994; Rice et al. 1989). Moreover, the technique
should allow the systematic assembly of full-length infectious dsDNA
genomes of adenoviruses, herpesviruses, and perhaps other large DNA
viruses that promise to be powerful tools in vaccination, gene transfer,
and gene therapy (Smith and Enquist 2000; van Zijl et al. 1988). Recently,
genome sequences from a large number of prokaryotic and eukaryotic
organisms have been obtained, providing significant insight into gene
organization, structure, and function (Cho et al. 1999; Hutchison et al.
1999) (TIGR homepage http://www.tigr.org). Using this strategy, it may
be possible to reconstruct a minimal microbial genome from the bottom
up. However, problems associated with isolating large DNA fragments
and the introduction of large DNA genomes into environments that permit
replication will likely be significant hurdles. Nevertheless, our
assembly strategy may provide a means to analyze the function of large
blocks of DNA, such as pathogenesis islands, or to engineer chromosomes
that contain large gene cassettes of interest (Cho et al. 1999).


4.2 

Engineering MHV Genomes 

Coronaviruses provide a unique system for the incorporation and
expression of one or more foreign genes (Enjuanes and Van der Zeijst
1995). Coronavirus genes rarely overlap, simplifying the design and
expression of foreign genes from downstream intergenic sequence (IS)
start sites. Integration of the coronavirus RNA genome into the host cell
chromosome is unlikely (Lai and Cavanagh 1997). Additionally,
recombinant viruses or replicon particles could be readily targeted to other
mucosal surfaces in swine or to other species by simple replacements in
the S glycoprotein gene, which has been shown to determine tissue and
species tropism (Ballesteros et al. 1997; Delmas et al. 1992; Kuo et al.
2000; Leparc-Goffart et al. 1998; Sanchez et al. 1999; Tresnan et al. 1996).
Furthermore, coronaviruses infect a number of different species,
including human, porcine, bovine, canine, and feline, and are available for
the development of expression systems (Sanchez et al. 1992). Additionally,
the coronavirus helical ribonucleocapsid structure may further relax the
packaging constraints of the virus, as compared to icosahedral
structures (Enjuanes and Van der Zeijst 1995; Lai and Cavanagh 1997; Risco
et al. 1996). Selected questions that remain unanswered include: (1)
What is the coding capacity of coronavirus-based expression systems?
(2) What is the minimal genome required for efficient replication? (3)
Can high-titer coronavirus replicon particles be obtained for vaccine
applications? (4) What are the minimal sequence requirements for
subgenomic transcription? (5) How many foreign genes can be coordinately
regulated without impeding virus replication or immunogenicity? (6)
What are the efficacy, stability, and safety of the recombinant
coronaviruses in natural settings? Clearly, these vaccine-related topics will
provide fruitful avenues of investigation over the next decade and will
greatly enhance our understanding of the mechanics of coronavirus
transcription, replication, assembly and release, and pathogenesis.

The future development of vaccines and expression vectors is a
particularly intriguing application of our TGEV and MHV infectious clones.
Importantly, at least two TGEV downstream ORFs encode luxury functions
(ORF 3a and 3b) that may be deleted from the viral genome
without impacting infectivity (Curtis et al. 2002; Laude et al. 1990;
McGoldrick et al. 1999; Wesley et al. 1991). We have developed a rapid
approach that allows seamless insertion of foreign sequences into
virtually any nucleotide position in the MHV genome, based on class IIS
restriction endonucleases (Fig. 4). In this approach, flanking sequences
around the target domain are amplified as separate arms linked by
unique class IIS restriction sites oriented as described in Fig. 3. A third
amplicon encoding the payload sequence of interest is isolated and flanked
by similar class IIS sites. After restriction digestion and ligation, the
foreign sequences are inserted into the backbone sequence at any given
nucleotide, leaving no evidence of the restriction sites that were used to
“sew” the new sequences into the MHV backbone. We have successfully
expressed GFP from the ORF 3a locus of TGEV (Curtis et al. 2002) and
ORF 4 of MHV (Fig. 5) (manuscript in preparation), demonstrating the
feasibility of the method and the use of TGEV and MHV as expression
vectors. In the case of TGEV, GFP expression was stable for at least
10 passages. In addition, we have removed ORF 3a and replaced it
with GP5 of PRRSV to create icTGEV PRRSV GP5 recombinant viruses
(Curtis KM and Baric RS, unpublished data). Recombinant viruses
expressed the PRRSV GP5 glycoprotein, as evidenced by indirect
immunofluorescence assay (IFA) and RT-PCR using primer pairs within the
TGEV leader and PRRSV GP5 gene (data not shown). Recently, expression
of the reporter gene β-glucuronidase (GUS) and PRRSV ORF 5
from a TGEV-derived minigenome was demonstrated (Alonso et al.
2002). Importantly, strong humoral immune responses against GUS and
PRRSV ORF 5 were generated in swine with these vectors, demonstrating
the feasibility of coronavirus-based vectors for future vaccine
development.

Fig. 4. Rapid mutagenesis of the MHV infectious cDNA with class IIS restriction
endonucleases. Seamless insertion of foreign genes into the coronavirus genome can
be accomplished with class IIS restriction enzymes. In this case, a target gene is
systematically removed and replaced by a new gene (new insert). Using a primer that
overlaps a unique upstream (Site A) restriction site, the upstream arm amplicon is
amplified by PCR with a second primer (Site B) containing an Esp3I recognition
site at the 5' end of the nonsense strand of DNA. A similar approach is used to
amplify the downstream arm (Site C and D primers). The insert DNA is amplified with
primer pairs containing compatible C and D Esp3I sites. After PCR amplification and
restriction digestion, the new insert can be inserted into the viral genome without
evidence of the restriction sites used in the assembly cascade. The large number of
available class IIS restriction enzymes greatly enhances the plasticity of the approach

Fig. 5a, b. Recombinant MHV-A59 expressing GFP. With standard molecular
techniques, ORF 4 was removed and the gene encoding GFP inserted downstream of the
ORF 4 IS (a). DBT cells were infected with wild-type MHV-A59 (A and C) or
icMHV-A59 GFP (B and D) and subsequently analyzed for CPE by light microscopy (A
and B) and GFP expression by fluorescent microscopy (C and D) (b)
5

SARS-CoV Infectious Clone 

Rapid response to and control of exigent emerging pathogens require an
approach to quickly generate full-length cDNAs from which molecularly
cloned viruses are rescued, allowing for genetic manipulation of the
genome. Identification of the first human coronavirus to cause considerable
morbidity and mortality worldwide provided the first template to
test the rapidity of our systematic assembly strategy (Drosten et al. 2003;
Ksiazek et al. 2003). Development of novel vaccine candidates and
therapeutics requires a better understanding of viral pathogenesis, a process
greatly facilitated by the availability of an infectious clone. A systematic
assembly strategy based on the TGEV infectious clone was employed to
create an infectious construct of SARS-CoV within ~2 months of the
identification and isolation of genomic SARS-CoV RNA (Yount et al.
2003). Consensus clones were assembled from sibling clones of each
SARS-CoV fragment by taking advantage of the special properties of
asymmetric type IIS restriction enzymes. Within 9 weeks, an infectious
clone SARS-CoV was isolated that was phenotypically indistinguishable
from wild-type SARS-CoV strains.

The SARS-CoV genome was cloned as six contiguous subclones that
could be systematically linked by unique BglI restriction endonuclease
sites (Fig. 6). Two BglI junctions were derived from sites encoded within
the SARS-CoV genome at nt 4,373 (A/B junction) and nt 12,065 (C/D
junction). A third BglI site at nt 1,577 was removed, and new BglI sites
were inserted by the introduction of silent mutations into the SARS-CoV
sequence at nt 8,700 (B/C junction), nt 18,916 (D/E junction), and nt
24,040 (E/F junction). The resulting cDNAs include SARS A (nt 1-4,436),
SARS B (nt 4,344-8,712), SARS C (nt 8,695-12,070), SARS D (nt
12,055-18,924), SARS E (nt 18,907-24,051), and SARS F (nt 24,030-29,736)
subclones. The SARS A subclone also contains a T7 promoter, and the SARS
F subclone terminates in 21 Ts, allowing synthesis of capped,
polyadenylated transcripts. SARS-CoV infectious clone virus was assembled,
transcribed, and transfected as described previously, and recombinant viruses
contained the marker mutations inserted into the infectious clone.
Recombinant viruses produced a mild pneumonia on x-ray in macaques
similar to wild-type viruses and replicated to similar titers in the mouse
model (unpublished observation). These data suggest that recombinant
viruses recapitulated the pathogenesis of wild type in animal models,
allowing for the identification of pathogenesis determinants and the
development of attenuated viruses as candidate live and killed vaccines.
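
The subclone coordinates and junction positions reported above can be
cross-checked with a few lines of code. The following is a minimal sketch,
not part of the published protocol: the function name check_junctions and
the dictionary layout are illustrative assumptions, and only the numbers
are taken from the text. It confirms that each reported BglI junction lies
within the region shared by the two adjacent subclones.

```python
# Subclone coordinates (nt) as reported in the text.
FRAGMENTS = {
    "A": (1, 4436),
    "B": (4344, 8712),
    "C": (8695, 12070),
    "D": (12055, 18924),
    "E": (18907, 24051),
    "F": (24030, 29736),
}

# BglI junction positions (nt) as reported in the text.
JUNCTIONS = {
    ("A", "B"): 4373,
    ("B", "C"): 8700,
    ("C", "D"): 12065,
    ("D", "E"): 18916,
    ("E", "F"): 24040,
}


def check_junctions(fragments, junctions):
    """Return {pair: (overlap_start, overlap_end, inside)} for each junction.

    The overlap of two adjacent subclones runs from the start of the
    right-hand fragment to the end of the left-hand fragment. 'inside' is
    True when the junction position falls within that overlap.
    """
    report = {}
    for (left, right), position in junctions.items():
        overlap_start = fragments[right][0]
        overlap_end = fragments[left][1]
        inside = overlap_start <= position <= overlap_end
        report[(left, right)] = (overlap_start, overlap_end, inside)
    return report


if __name__ == "__main__":
    for (left, right), (start, end, inside) in check_junctions(
        FRAGMENTS, JUNCTIONS
    ).items():
        status = "within overlap" if inside else "OUTSIDE overlap"
        print(f"{left}/{right}: overlap nt {start}-{end}, junction {status}")
```

With the values given in the text, all five junctions fall inside the
corresponding overlaps, which is what the assembly scheme requires.
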
Fig. 6. Systematic assembly strategy for the SARS-CoV infectious clone. The
SARS-CoV genome is about 30 kb in length and contains ~14 open reading frames
(ORFs). The predicted functions of the group-specific ORFs (ORF 3a/b, ORF 6,
ORF 7a/b, ORF 8a/b, ORF 9b) are unknown. Dark gray squares indicate highly
conserved consensus sequence sites that function in subgenomic RNA synthesis. Six
independent subclones (A, B, C, D, E, and F) that span the entire SARS-CoV genome
were isolated by RT-PCR (genome fragments are not shown to scale). The A fragment
spans nt 1-4,436, the B fragment nt 4,344-8,712, the C fragment nt 8,695-12,070,
the D fragment nt 12,055-18,924, the E fragment nt 18,907-24,051, and the F
fragment nt 24,030-29,736. Unique BglI restriction sites located at the 5' and 3'
ends of each subclone were used to assemble a full-length cDNA. A unique T7 start
site was inserted at the 5' end of the SARS-CoV A fragment, and a 21-nt poly(T)
tail was inserted at the 3' end of the SARS-CoV F fragment, allowing for in vitro
transcription of full-length, capped, polyadenylated transcripts.
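
Locating BglI sites in a candidate sequence is a simple pattern search. The
sketch below assumes the commonly cited BglI recognition sequence
GCCNNNN^NGGC, in which cleavage leaves a 3-nt 3' overhang drawn from the
variable bases, so that junctions with different overhangs join in only one
order. The function name find_bgli_sites and the toy sequence are
hypothetical and are not taken from the SARS-CoV genome.

```python
import re

# Assumed BglI recognition sequence: GCCNNNN^NGGC (N = any base).
# The lookahead lets overlapping sites be reported as well.
BGLI_PATTERN = re.compile(r"(?=(GCC[ACGT]{5}GGC))")


def find_bgli_sites(sequence):
    """Return (position, site, overhang) for each BglI site found.

    position is the 1-based coordinate of the first G of the site.
    overhang is the 3-nt single-stranded end expected on the top strand
    under the assumed cut positions (bases 5-7 of the 11-nt site).
    """
    sequence = sequence.upper()
    sites = []
    for match in BGLI_PATTERN.finditer(sequence):
        site = match.group(1)
        sites.append((match.start() + 1, site, site[4:7]))
    return sites


if __name__ == "__main__":
    # Hypothetical toy sequence containing two BglI sites.
    toy = "AAGCCTACGTGGCTTGCCAAAAAGGCTT"
    for position, site, overhang in find_bgli_sites(toy):
        print(f"nt {position}: {site} (overhang {overhang})")
```

Because the fixed portions of the site are palindromic, scanning one strand
is sufficient to find every site. A site count of one per intended junction
is the condition the text describes as "unique".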


6 

Future Applications 

The availability of infectious cDNA clones will undoubtedly have a pro¬ 
found effect on the field of coronavirology. These new tools will facilitate 
basic studies and allow for more precise analyses of the molecular mech¬ 
anisms of viral replication, including the definition of RNA elements im¬ 
portant for RNA replication, subgenomic RNA transcription, and genomic
RNA packaging. In addition, studies of gene function will be en¬
hanced by the availability of infectious cDNA clones by allowing for the 
construction of recombinant viruses and/or replicons containing muta¬ 
tions and the analysis of their effects on viral replication and assembly. 
MHV has long been used as a premier model to study coronavirus
assembly and release, replication, transcription, entry, and pathogenesis.
The availability of MHV and SARS-CoV infectious cDNA clones will 
complement the existing targeted recombination approaches by providing
a tool for the mutagenesis of the replicase gene, which encodes a large
number of cleavage products that have not been fully characterized. The
structure and function of the ~20-kb MHV replicase domain will likely 
remain a fertile area of research for the next decade and reveal novel 
protein functions that participate in and regulate discontinuous
transcription and high-frequency RNA recombination. Although large panels of
reagents are available for analyzing replicase protein expression,
processing, and subcellular localization, a spectrum of genetically
informative mutations has not been systematically targeted to any of these
replicase proteins. Given the complexity and size of the coronavirus 
replicase gene, the number of potential mutants that can be generated is 
enormous and will likely require bioinformatic approaches for building 
and testing specific hypotheses. For example, the ORF1a C-terminal
MHV p15 protein is highly conserved among group I through III
coronaviruses and contains a large number of conserved cysteine residues and
predicted phosphorylation, myristylation, and glycosylation sites
(PROSITE, unpublished) (Fig. 7). The original sequence report of p15 also
suggested possible similarities to growth factor-like proteins (Lee et al.
1991). Recent studies with an IBV homolog suggest that p15 exists as a
dimer and accumulates on stimulation with epidermal growth factor,
providing some evidence that the protein might be involved in the
growth factor signaling pathway (Ng and Liu 2002). A single amino acid
mutation has been identified in p15 of the temperature-sensitive mutant
LA6, an MHV-A59 mutant with a defect in RNA synthesis at the
nonpermissive temperature (Siddell et al. 2001). The availability of infectious
cDNAs allows, for the first time, a systematic mutagenesis approach for
studying the function of specific structural features within this and
other replicase proteins.
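
Candidate residues for such mutagenesis can be shortlisted by scanning a
protein sequence for simple motifs. The following is a minimal sketch under
stated assumptions: it uses the PROSITE-style N-glycosylation pattern
N-{P}-[ST]-{P} and a plain cysteine count, the function names are
illustrative, and the peptide is a made-up toy sequence rather than the
p15 sequence. A pattern match indicates only a potential site, not an
experimentally confirmed modification.

```python
import re

# PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P}, written as a regex.
# The lookahead allows overlapping candidate sites to be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def scan_n_glycosylation(protein):
    """Return (1-based position, 4-residue motif) for each candidate site."""
    protein = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in N_GLYCOSYLATION.finditer(protein)]


def cysteine_positions(protein):
    """Return the 1-based positions of all cysteine residues."""
    return [i + 1 for i, residue in enumerate(protein.upper()) if residue == "C"]


if __name__ == "__main__":
    # Hypothetical toy peptide used only to exercise the functions.
    toy_peptide = "MNCTACNPSCKNGSW"
    print("Candidate N-glycosylation sites:", scan_n_glycosylation(toy_peptide))
    print("Cysteine positions:", cysteine_positions(toy_peptide))
```

Each reported position is a candidate for site-directed mutagenesis in the
infectious clone, for example by substituting the asparagine or a conserved
cysteine and assaying the effect on replication.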

Given the capacity to isolate large panels of mutants in each
of the replicase proteins, selected questions include: (1) Is each of the
PL1pro, PL2pro, and 3CLpro cleavage sites necessary for MHV replication?
(2) Are the PL1pro, PL2pro, or 3CLpro proteases essential for replication?
(3) Are any replicase proteins nonessential? (4) Is replicase gene order
critical? (5) Are replicase proteins interchangeable between the group 1
and/or group 2 coronaviruses? (6) How do replication complexes form
on membranes? (7) What replicase complexes regulate discontinuous
transcription and synthesis of genome-length and subgenomic-length
mRNAs and negative-strand RNAs? (8) What are the cis-acting sequence
elements required for genomic RNA packaging and replication? (9)
What are the structure-function relationships within and between
various replicase proteins and/or RNAs? (10) What are the functions of the
group-specific ORFs, and how do they influence pathogenesis? The next
decade of research may well be defined as the golden age of coronavirus
genetics.

Fig. 7. Potential sites of mutagenesis within the C-terminal ORF1a p15 replicase
protein. The MHV p15 replicase protein is highly conserved among all Coronaviridae
(hatched domains) and contains several hydrophobic domains (Hp1-5), several
potential sites for myristylation (gray triangles), and 10 highly conserved cysteine
residues (Cys). Several sites for phosphorylation and glycosylation are predicted
with PROSITE analysis, although it is unclear whether p15 is phosphorylated or
glycosylated.